iterated logarithm
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > Costa Rica > Heredia Province > Heredia (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
An Asymptotic Law of the Iterated Logarithm for $\mathrm{KL}_{\inf}$
The population $\mathrm{KL}_{\inf}$ is a fundamental quantity that appears in lower bounds for (asymptotically) optimal regret of pure-exploration stochastic bandit algorithms, and optimal stopping time of sequential tests. Motivated by this, an empirical $\mathrm{KL}_{\inf}$ statistic is frequently used in the design of (asymptotically) optimal bandit algorithms and sequential tests. While nonasymptotic concentration bounds for the empirical $\mathrm{KL}_{\inf}$ have been developed, their optimality in terms of constants and rates is questionable, and their generality is limited (usually to bounded observations). The fundamental limits of nonasymptotic concentration are often described by the asymptotic fluctuations of the statistics. With that motivation, this paper presents a tight (upper and lower) law of the iterated logarithm for empirical $\mathrm{KL}_{\inf}$ applying to extremely general (unbounded) data.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.86)
A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning
In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as also with each other, for solving a shared problem in sequential decision-making. It has wide-ranging applications in gaming, robotics, finance, communication, etc. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path where the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than the existing ones, which only discuss convergence rates in the expected or the CLT sense. Importantly, our result holds under significantly weaker assumptions: neither the gossip matrix needs to be doubly stochastic nor the stepsizes square summable. As an application, we show that, for the stepsize $n^{-\gamma}$ with $\gamma \in (0, 1),$ the distributed TD(0) algorithm with linear function approximation has a convergence rate of $O(\sqrt{n^{-\gamma} \ln n })$ a.s.; for the $1/n$ type stepsize, the same is $O(\sqrt{n^{-1} \ln \ln n})$ a.s. These decay rates do not depend on the graph depicting the interactions among the different agents.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > Costa Rica > Heredia Province > Heredia (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > Costa Rica > Heredia Province > Heredia (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning
In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as also with each other, for solving a shared problem in sequential decision-making. It has wide-ranging applications in gaming, robotics, finance, communication, etc. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path where the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than the existing ones, which only discuss convergence rates in the expected or the CLT sense.
Catoni-style Confidence Sequences under Infinite Variance
Bhatt, Sujay, Fang, Guanhua, Li, Ping, Samorodnitsky, Gennady
In this paper, we provide an extension of confidence sequences for settings where the variance of the data-generating distribution does not exist or is infinite. Confidence sequences furnish confidence intervals that are valid at arbitrary data-dependent stopping times, naturally having a wide range of applications. We first establish a lower bound for the width of the Catoni-style confidence sequences for the finite variance case to highlight the looseness of the existing results. Next, we derive tight Catoni-style confidence sequences for data distributions having a relaxed bounded~$p^{th}-$moment, where~$p \in (1,2]$, and strengthen the results for the finite variance case of~$p =2$. The derived results are shown to better than confidence sequences obtained using Dubins-Savage inequality.
- North America > United States > Washington > King County > Bellevue (0.04)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
Sequential Nonparametric Testing with the Law of the Iterated Logarithm
Balsubramani, Akshay, Ramdas, Aaditya
We propose a new algorithmic framework for sequential hypothesis testing with i.i.d. data, which includes A/B testing, nonparametric two-sample testing, and independence testing as special cases. It is novel in several ways: (a) it takes linear time and constant space to compute on the fly, (b) it has the same power guarantee as a non-sequential version of the test with the same computational constraints up to a small factor, and (c) it accesses only as many samples as are required - its stopping time adapts to the unknown difficulty of the problem. All our test statistics are constructed to be zero-mean martingales under the null hypothesis, and the rejection threshold is governed by a uniform non-asymptotic law of the iterated logarithm (LIL). For the case of nonparametric two-sample mean testing, we also provide a finite sample power analysis, and the first non-asymptotic stopping time calculations for this class of problems. We verify our predictions for type I and II errors and stopping times using simulations.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asymptotic Behavior of Minimal-Exploration Allocation Policies: Almost Sure, Arbitrarily Slow Growing Regret
Cowan, Wesley, Katehakis, Michael N.
The purpose of this paper is to provide further understanding into the structure of the sequential allocation ("stochastic multi-armed bandit", or MAB) problem by establishing probability one finite horizon bounds and convergence rates for the sample (or "pseudo") regret associated with two simple classes of allocation policies $\pi$. For any slowly increasing function $g$, subject to mild regularity constraints, we construct two policies (the $g$-Forcing, and the $g$-Inflated Sample Mean) that achieve a measure of regret of order $ O(g(n))$ almost surely as $n \to \infty$, bound from above and below. Additionally, almost sure upper and lower bounds on the remainder term are established. In the constructions herein, the function $g$ effectively controls the "exploration" of the classical "exploration/exploitation" tradeoff.
- Information Technology > Artificial Intelligence (0.68)
- Information Technology > Data Science > Data Mining > Big Data (0.66)
Sharp Finite-Time Iterated-Logarithm Martingale Concentration
Martingales are indispensable in studying the temporal dynamics of stochastic processes arising in a multitude of fields [10, 14]. Particularly when such processes have complex long-range dependences, it is often of interest to concentrate martingales uniformly over time. On the theoretical side, a fundamental limit to such concentration is expressed by the law of the iterated logarithm (LIL). However, this only concerns asymptotic behavior. In many applications, it is more natural to instead consider concentration that holds uniformly over all finite times. This manuscript presents such bounds for the large classes of martingales which are addressed by Hoeffding [11] and Bernstein [8] inequalities. These new results are optimal within small constants, and can be viewed as finite-time generalizations of the upper half of the LIL.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York (0.04)
- (3 more...)